ECC BASED SYSTEM AND METHOD FOR REPAIRING FAILED MEMORY ELEMENTS


FIELD OF THE INVENTION 
The present invention relates to dynamic random access memories (DRAMs) including 
DRAMs embedded in multi-purpose integrated circuits. More specifically, the invention relates to 
a system and method for repairing a failed memory element.

BACKGROUND OF THE INVENTION 

Recently, designers of ASICs (Application Specific Integrated Circuits) have expressed 
interest in incorporating DRAM macros to enhance on-chip storage density. The greater processing
widths and speeds now available to ASICs are beginning to demand storage densities which stretch
the limits of static random access memory (SRAM), which has traditionally been embedded in 
those ASICs which include a processor element such as a microprocessor. 

Owing to their diverse circuit implementations and design goals, production verification
testing of ASICs and DRAMs has differed widely. Traditional ASICs, having mainly logic circuits
such as for a microprocessor and SRAM elements, are production tested in only a few minutes,
because failures are manifested by relatively large defect currents, from several to several hundred
microamps (> 1 x 10^-6 amps), which are manifested either before or after very short durations of high
stress testing. On the other hand, it may take tens of hours of burn-in testing to manifest all early
life failures within a DRAM because of much smaller tolerances for defect currents, which
typically measure in sub-picoamps (< 1 x 10^-12 amps). Moreover, since DRAMs typically have
greater integration density than logic circuits, the defect density is greater. As illustrated in Fig. 1,
it has been noted that the defect density (the number of defects within a given volume) in an
integrated circuit lies in inverse relation to the cube of the size of the defective element. For
example, assuming an element size in a DRAM which is one half the element size used in a
microprocessor, the defect density of the DRAM is expected to be more than eight times the
defect density of the microprocessor.
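
The inverse-cube relation can be checked with a short calculation. The following is a minimal sketch only; the function name `relative_defect_density` and its `exponent` parameter are assumptions introduced for illustration and do not appear in the disclosure. The pure cube law gives a factor of eight for a halved element size, which is the lower bound of the "more than eight times" figure.

```python
def relative_defect_density(element_size_ratio: float, exponent: float = 3.0) -> float:
    """Defect density relative to a reference circuit.

    element_size_ratio is (element size) / (reference element size).
    The default exponent of 3 models the inverse-cube relation of Fig. 1.
    """
    if element_size_ratio <= 0:
        raise ValueError("element_size_ratio must be positive")
    return element_size_ratio ** -exponent


# DRAM element size assumed to be one half of the microprocessor element size.
dram_vs_logic = relative_defect_density(0.5)
print(dram_vs_logic)
```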

Thus, when a DRAM macro is embedded into an ASIC, a problem is presented for 
production verifying the completed integrated circuit. If traditional ASIC test methods are used
which are short in duration but at high stress, many marginal DRAM memory elements will not be 
identified at time of test. Instead, such marginal DRAM elements will only fail later once the 
ASIC is packaged and used in the final product by the end user. However, if traditional DRAM 
test methods are used which have long duration, this poses a major disruption to standard ASIC 
test and reliability screening processes. 

Table 1 below indicates failure modes for elements within a DRAM, and the frequencies
with which they are manifested through burn-in testing of each integrated circuit.


Table 1

Array Failure              Frequency   Root Cause         No. of Lost Bits
-------------------------  ---------   ----------------   ----------------
Single cell failure        98%         Crystal defect,    1
                                       oxide defect
Paired cell failure        1%          Contact defect     2
Wordline x Bitline         <1%         Contact-WL short   >2
Partial bitline or         <1%         Open metal         >2
partial wordline failure
Full bitline or            <<1%        Miscellaneous      256 to 4096 bits
full wordline failure

It is apparent from the above and Table 1 that ASICs which incorporate a DRAM will
perform poorly unless provision is made for the DRAM early life failures. From Table 1, it is
apparent that the majority of early life failures affect a single bit memory cell. However, a
significant number of early life failures affect multiple memory cells, such as partial or full bitline or
wordline failures. Since currently practiced long duration DRAM testing is undesirable with
present ASIC testing, an alternative approach is needed for handling DRAM early life failures
while meeting reliability goals in the final product.

Figure 2 is a block diagram illustrating the structure and operation of a conventional
DRAM, and is provided and described here as background to the present invention. A DRAM
may be "standalone", i.e. the only circuitry on an integrated circuit, or it may be "embedded", i.e.
incorporated as a memory along with a logic core or microprocessor on integrated circuits such as
ASICs. As shown in Fig. 2, a DRAM typically includes a number of banks, shown here as
Bank<0:3>. Each bank contains an array of DRAM memory cells 201, each of which
lies at the intersection between a wordline 203 and a bitline 205. By way of example, read access
to a memory cell 201 is performed in the following manner. Address (ADDR), and commands for
bank select, row command (Row Cmd) and column command (Col Cmd) are presented to a row
control unit (Row Cntl 207) and column control unit (Col Cntl 209). From these signals, row
decoder (Row Dec. 211) selects and activates a wordline 203. The activated wordline 203 causes
information stored in all memory cells coupled to that wordline 203 to be placed on respective
bitlines, including the information from memory cell 201 which is placed on bitline 205. The
retrieved information from the memory cells is then transferred to sense amplifiers 213, but in
typical DRAMs only a fraction, e.g. one fourth, of the bits accessed by the wordline 203 are
selected and output onto a databus DQ 217 by column decoders 215, column selection being
based on the column address portion of the address ADDR.


SUMMARY OF THE INVENTION 


Accordingly, in the present invention, a word including data bits and check bits is stored
in an addressed location of a memory within an integrated circuit. Preferably, the memory is a
DRAM, but the invention could also be implemented for another memory type such as SRAM
and/or electrically erasable programmable read only memory (EEPROM), including flash memory.
On read access, the data bits and check bits are retrieved and processed to verify the retrieved
data bits, and to detect and/or correct any bit errors therein. The verified data bits are then output
onto a data bus within the ASIC. When a single bit correction, double error detection
(SEC/DED) code is used, errors due to single cell failures are corrected. Therefore, the on-board error
correction capability within the integrated circuit memory fully corrects for single cell failures in
almost all cases.

However, since other failure modes may appear within the memory, the data and check
bits are further processed within the integrated circuit, as by reference to the error correction code
(ECC) syndrome string for the retrieved bits, to determine the locations of memory failures. At
this stage of processing, the locations of memory failures are automatically identified and recorded
in terms of bit location within the array, identified by row address, syndrome string, and column
address information as needed.


At some appropriate time, which can be ongoing during operation, or scheduled upon a
power-down or power-up sequence, failure patterns are automatically identified by first logic
circuits based on the recorded locations. Examples of failure patterns include partial or full
wordline failure and full or partial bitline failure. Based on the identified failure patterns, logic
circuits within the integrated circuit then automatically replace one or more failed memory
elements, e.g. single cell failures, full or partial wordline failures, full or partial bitline failures,
etc., with redundancy elements such as partial or full wordline redundancy, and partial or full
bitline redundancy, among others.


BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows a plot of defect density versus defect size characteristic of an integrated
circuit memory such as a DRAM.

Fig. 2 is a prior art block diagram illustrating the structure and operation of a DRAM.

Fig. 3 is a block diagram illustrating structure and operation according to a preferred
embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Fig. 3 illustrates a preferred arrangement for implementing the present invention in the
context of a DRAM, although the invention could be implemented with another memory type
such as SRAM or EEPROM. As shown in Fig. 3, a DRAM, either standalone or preferably
embedded within an ASIC, includes a Bank<0> of memory cells arranged within an array 301.
Memory cells within array 301 are accessed by an address and commands in the manner described
above relative to Fig. 2. Circuitry which selects row, column, and bank, as well as associated
control circuitry and sense amplifiers are not shown in Fig. 3 but are understood to be present, as
described above relative to Fig. 2. During a read operation, retrieved information from a row or
partial row of memory cells in array 301 is output from sense amplifiers (not shown) associated
with Bank<0> as a data word 303 to an ECC system 305. Typically, a row of memory cells is
accessed by a wordline using a row address. A plurality of data words 303 are stored within the
row. For example, a row which stores 512 data bits can contain four data words 303, each having
128 data bits. In such case, two bits of the column address are needed to uniquely identify which
of the four data words 303 is to be accessed from the row.
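
The relationship between row width, data word width and the number of column address bits can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the contiguous layout of data words within a row assumed by `select_word` is not stated in the description.

```python
def words_per_row(row_data_bits: int, word_data_bits: int) -> int:
    """Number of data words stored in one row."""
    if row_data_bits % word_data_bits:
        raise ValueError("row width must be a multiple of the word width")
    return row_data_bits // word_data_bits


def column_select_bits(words: int) -> int:
    """Column address bits needed to pick one of `words` data words."""
    return (words - 1).bit_length()


def select_word(row_bits: list, word_index: int, word_data_bits: int) -> list:
    """Return one data word from a row, assuming words are laid out contiguously."""
    start = word_index * word_data_bits
    return row_bits[start:start + word_data_bits]


n_words = words_per_row(512, 128)
print(n_words, column_select_bits(n_words))
```

With a 512 bit row and 128 bit data words, the sketch yields four words per row and two column address bits, matching the example in the text.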

The ECC system 305 includes circuits which process a set of data bits and check bits
within the data word 303 to output a verified or corrected data string 307 to data bus DQ 308.
The ECC system also outputs a syndrome string 309 to a column address and syndrome string
register 311, which register 311 is intended to be understood broadly as being of any type suitable
for short term storage such as a buffer, cache or address range within a RAM, without limitation.
As indicated above, the error correction code (ECC) preferably enables single bit correction,
double error detection (SEC/DED), and the invention will be further described in relation thereto.
An example of an on-chip ECC system which provides for SEC/DED correction is described in
U.S. Patent No. 2,571,317. DRAMs embedded into ASICs typically employ wide data paths of
128 bits or greater. The actual overhead in terms of number of check bits per data bits to
implement SEC/DED correction decreases as the data path is designed to be wider. For example,
it takes 8 check bits to implement SEC/DED on a 64 bit data path for a data word 303 of 72 bits
in total width. However, when the data path is increased to 128 bit data width, only 9 check bits
are needed for SEC/DED for a data word 303 of 137 bits in total width.
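
The check bit counts quoted above follow from the standard extended Hamming construction, in which r check bits must satisfy 2^r >= k + r + 1 for k data bits and one further bit provides overall parity for double error detection. The sketch below assumes that construction; the description does not name the specific code used, and the function name `secded_check_bits` is hypothetical.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for an extended Hamming SEC/DED code over `data_bits` data bits."""
    r = 0
    # Smallest r such that the r-bit syndrome can name every bit position plus "no error".
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1  # one extra overall parity bit for double error detection


for width in (64, 128):
    check = secded_check_bits(width)
    overhead = 100.0 * check / width
    print(f"{width} data bits: {check} check bits, {width + check} total, {overhead:.1f}% overhead")
```

Under this assumption the overhead falls from 12.5% at 64 data bits to about 7.0% at 128 data bits.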

When only a single bit error is present, the syndrome string 309 indicates the precise bit
position of the error within the data word 303. When the row of the array 301 contains only one
data word 303, the row address 317 and the syndrome string 309 uniquely identify the location of
a memory failure within the array 301 which affects at least one memory cell. However, as
described above, for a memory in which four data words 303 are stored in a row, two column
address bits are needed to identify the particular data word 303 within the row. Therefore, for a
general case in which a plurality of data words 303 are stored in each row, the row address 317,
the syndrome string 309 and some portion of the column address are needed to uniquely identify
the location of a memory failure within array 301. This embodiment will be further described in
relation to such general case.

When more than one bit error is present, for example, two bit errors, the syndrome string
309 for such SEC/DED ECC sometimes correctly indicates the position of the errors within the
data word, but more often only indicates that the data word is flawed, but not otherwise
correctable. In case of either a single bit error or multiple bit errors in data word 303, an error flag
313 is output from ECC system 305 to row register 315, in response to which row register 315
records the current row address 317. The error flag 313 is also output to column address and
syndrome string register 311, in response to which the current syndrome string 309 and column
address information is stored in register 311, thus identifying the location of a memory failure
within the array 301. The recorded location identifies a memory failure which affects at least one
memory cell.
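
How a syndrome can name the failing bit position, and how an error flag can be derived from it, may be illustrated with a small extended Hamming code. This is a sketch under stated assumptions: the description does not specify the code construction, the functions `encode` and `check` are hypothetical, and an 8 bit data word is used only to keep the example short.

```python
from functools import reduce
from operator import xor


def parity_bit_count(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r


def encode(data):
    """Build a code word; index 0 holds overall parity, powers of two hold check bits."""
    r = parity_bit_count(len(data))
    n = len(data) + r
    word = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two, so a data position
            word[pos] = next(bits)
    syndrome = reduce(xor, (p for p in range(1, n + 1) if word[p]), 0)
    for j in range(r):                   # choose check bits that zero the syndrome
        word[1 << j] = (syndrome >> j) & 1
    word[0] = reduce(xor, word[1:], 0)   # overall parity over positions 1..n
    return word


def check(word):
    """Return (status, position); status is 'ok', 'single' or 'double'."""
    syndrome = reduce(xor, (p for p in range(1, len(word)) if word[p]), 0)
    overall = reduce(xor, word, 0)
    if syndrome == 0 and overall == 0:
        return ("ok", None)
    if overall == 1:
        return ("single", syndrome)      # position 0 means the overall parity bit itself
    return ("double", None)              # detected, but position not recoverable


stored = encode([1, 0, 1, 1, 0, 0, 1, 0])
one_bad = stored.copy()
one_bad[5] ^= 1
two_bad = stored.copy()
two_bad[5] ^= 1
two_bad[9] ^= 1
for w in (stored, one_bad, two_bad):
    status, position = check(w)
    error_flag = status != "ok"          # would trigger logging of row, column and syndrome
    print(status, position, error_flag)
```

In this sketch a single flipped bit is located exactly by the syndrome, whereas two flipped bits raise the error flag without yielding a usable position.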

Repair system 321 includes control block 323, a block 325 of row redundancy elements
and a block 327 of column redundancy elements. A primary function of the repair system 321 is to
provide alternate storage locations when a single memory cell or larger memory elements such as
a partial or full bitline or a partial or full wordline are identified as failed. The existence of such
alternate storage locations and the method of access thereto are unknown to the entity, e.g. a
microprocessor element of the ASIC, that requests storage access. Thus, the control block 323
operates upon an incoming address 329 to steer access to a row redundancy element or to a
column redundancy element of blocks 325, 327 after a failed row element or column element of
array 301 has been replaced with the row or column redundancy element. When a particular row
or column element of array 301 has not failed, memory access is provided from array 301 in the usual
manner rather than to row or column redundancy elements of blocks 325, 327. It will be
understood that address 329 is also provided to the row decoder and column decoder (shown, for
example, in Fig. 2) associated with array 301, and additional circuitry (also not shown) will be
present which selects either the output of array 301 or of a row or column redundancy element of
blocks 325, 327 as input to ECC system 305, depending upon the status of an address as reflected
in control block 323. It will be understood that row redundancy elements of block 325 and
column redundancy elements of block 327 can be located within one or more redundancy arrays
on the integrated circuit which are separately located from array 301. Alternatively, such row and
column redundancy elements of blocks 325, 327 can be incorporated as normally unused elements
of array 301, put in use only after redundancy replacement is made.

Control block 323 also has the function of allocating individual row and/or column
redundancy elements from blocks 325, 327 to replace failed memory elements such as row or
column elements within array 301. Control block 323 generates signals which electrically alter
circuit connections, as by electronic fuses or antifuses, for example, to permanently steer access to
certain row and/or column redundancy elements within blocks 325, 327 when the incoming
address 329 points to a failed row element and/or column element of array 301. An exemplary
description of electronic fuses and their operation is provided in co-owned U.S. Patent Application
Serial No. 09/466,479 filed December 17, 1999 entitled "Methods and Apparatus for Blowing
and Sensing Antifuses", having IBM Docket No. BUR9-1999-0038US1. Electronic fuses and/or
antifuses which are permanently altered by control block 323 cause row or column redundancy
elements within blocks 325 or 327 to be accessed in place of failing row or column elements
within array 301 upon receipt of certain addresses which point to the failing elements within array
301.
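
The steering behaviour of control block 323 can be modelled in software as a write-once lookup from failed row addresses to spare rows. The following sketch is a behavioural illustration only, covering row replacement; the class `RepairControl` and its methods are hypothetical, and a dictionary entry stands in for a permanently programmed fuse or antifuse.

```python
class RepairControl:
    """Hypothetical behavioural model of row steering in a repair control block."""

    def __init__(self, spare_rows: int):
        self.free_spares = list(range(spare_rows))
        self.row_map = {}  # failed row address -> spare row index, written once

    def replace_row(self, failed_row: int) -> bool:
        """Allocate a spare row for a failed row; returns False when spares are exhausted."""
        if failed_row in self.row_map:
            return True                  # already repaired; the mapping is never changed
        if not self.free_spares:
            return False
        self.row_map[failed_row] = self.free_spares.pop(0)
        return True

    def steer(self, row: int):
        """Route an incoming row address to the array or to a redundancy element."""
        if row in self.row_map:
            return ("redundancy", self.row_map[row])
        return ("array", row)


control = RepairControl(spare_rows=2)
before = control.steer(7)
control.replace_row(7)
after = control.steer(7)
print(before, after, control.steer(8))
```

The requesting element presents the same address before and after repair, which reflects the statement that the existence of the alternate storage locations is unknown to the requester.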

Control block 323 further receives the following inputs: a row address retrieved from row
address register 315, and the syndrome string 309 and column address information retrieved from
register 311. As will be described more fully below, the row address, the syndrome string and
column address information are inputs to a failure identification process which uses these inputs to
identify failure patterns and determine the address of one or more failed partial or full row or
column elements within array 301 which are candidates for replacement by redundancy elements.
The process, which may be implemented, for example, in hard-coded form, e.g. a state machine, or
as a programmed process, may be designed to execute within control block 323 or, alternatively,
within one or more processor elements within the same integrated circuit which have an
input/output connection to control block 323. Preferably, the failure identification process forms a
part of a built-in-self-test (BIST) feature of the integrated circuit. Based on the identified failure
patterns, a redundancy allocation process of control block 323 allocates a row or column
redundancy element from blocks 325 and 327 in place of a failed memory element of array 301.
Continuing the redundancy allocation process, control block 323 generates signals, which are
used to electrically alter certain circuit connections, as by electronic fuse or antifuse. Future
memory access to the failed memory element is then provided to a row or column redundancy
element within blocks 325 or 327 in place of the originally accessed row or column element within
array 301.
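
One way a failure identification process might turn logged locations into failure patterns is to group them by row and by column. The sketch below is an assumption-laden illustration: the description does not give the identification algorithm, the function `classify_failures` and its `threshold` parameter are hypothetical, and each log entry is assumed to be a (row address, data word index, bit position) triple recovered from single bit error syndromes.

```python
from collections import defaultdict


def classify_failures(log, threshold=3):
    """Split logged failures into suspect rows, suspect columns and isolated cells.

    A row (or column) is treated as a wordline (or bitline) failure candidate when
    at least `threshold` distinct columns (or rows) have failed along it.
    """
    cols_by_row = defaultdict(set)
    rows_by_col = defaultdict(set)
    for row, word, bit in log:
        column = (word, bit)
        cols_by_row[row].add(column)
        rows_by_col[column].add(row)
    bad_rows = {r for r, cols in cols_by_row.items() if len(cols) >= threshold}
    bad_cols = {c for c, rows in rows_by_col.items() if len(rows) >= threshold}
    single_cells = {
        (row, word, bit)
        for row, word, bit in log
        if row not in bad_rows and (word, bit) not in bad_cols
    }
    return bad_rows, bad_cols, single_cells


log = [
    (7, 0, 1), (7, 0, 2), (7, 0, 3),        # several columns failing along row 7
    (10, 2, 40), (11, 2, 40), (12, 2, 40),  # one column failing across several rows
    (20, 1, 5),                             # isolated cell
]
bad_rows, bad_cols, single_cells = classify_failures(log)
print(bad_rows, bad_cols, single_cells)
```

Rows and columns flagged in this way would be the candidates for row and column redundancy elements, while isolated cells could be left to ECC correction or repaired individually.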

In a first preferred embodiment, the failure identification and redundancy allocation
processes operate during a normal operational mode of the integrated circuit. In such case, the
integrated circuit services requests in normal manner except as to memory access requests to the
particular memory Bank<0> in which the failure identification and redundancy allocation
processes are being performed at the time. In such case, one or more control signals can be
asserted as a busy signal on a handshaking bus or master bus or as signals to logic elements of the
integrated circuit, such as an ASIC processor. In this manner, memory access requests made to
the integrated circuit by other system level elements would not be dropped; a dropped request could
otherwise lead to a system level fault.

In another preferred embodiment, the failure identification and redundancy allocation 
processes operate during a power-down sequence or during a power-up sequence, or both. Such 
processes could be triggered by available processor commands for power-down or power-up 
operation, and be completed at a time when memory availability is not required. Such power- 
down and power-up sequences may be triggered as part of the system's energy saving modes of 
operation. Since the failure identification and redundancy allocation processes are normally 
expected to involve only several microseconds delay, they would not be noticeable to the system 
end-user. 

The operation of a preferred embodiment will now be described, again relative to Fig. 3.
An integrated circuit incorporating a DRAM according to the preferred embodiment can be
production tested using only a short duration burn-in process such as that which is common for
logic or processor ASICs. After such testing, the DRAM would be expected to retain a number of
memory failures which are possibly unidentified. Memory access to the DRAM is provided on a
bank by bank basis, as illustrated in Fig. 3. Within a bank, e.g. Bank<0>, a data word 303
containing both data bits and check bits is output to ECC system 305. Based
on data word 303, the ECC system 305 verifies and/or corrects the data bits and outputs them as
a corrected data string 307 to a DQ bus 308, which then transfers the data bits to a requesting
element of the integrated circuit such as a processor. By use of a SEC/DED ECC, single bit errors
are corrected, such that the vast majority of single memory cell failures are corrected.

The ECC system 305 also outputs a syndrome string 309 to syndrome string register 311
which indicates the bit position of an error when present within data word 303. An error flag 313
is provided to syndrome string register 311 and row register 315 to signal presence of one or
more errors, which signals these registers to store the current row address, the syndrome string 
and column address information, thereby uniquely identifying the location of a memory failure. 

At an appropriate point in time, which may be, for example, during a normal operational
mode or during a power-up or power-down sequence, the information logged in row register 315
and column address and syndrome string register 311 is used by a failure identification process
to identify failure patterns within array 301 of Bank<0>. Failure patterns, such as failures of a
partial row or whole row, partial column or whole column are identified by this process. Based on
the identified failure patterns, allocation of redundancy elements is made from row and column
redundancy blocks 325, 327, and circuit connections are electrically altered by activating elements
such as electronic fuses or antifuses, such that access to a failing element of memory array 301 is
thereafter made instead to the allocated row redundancy element or column redundancy element.

Among the advantages, when the preferred embodiment of the invention is incorporated
into an ASIC, is greater system reliability. The operation of the ECC to fix correctable errors (e.g.
single bit errors) and check for multiple bit errors in retrieved data words is a scrubbing process
that continues throughout the lifetime of the ASIC. In addition, another improvement to system
reliability is seen in the provision on board the ASIC to identify failure patterns, to allocate
redundancy elements in response thereto, and to electrically alter circuit connections to replace
failed memory elements with redundancy elements, all while the ASIC remains installed normally
in a system. Moreover, as ASIC operating temperatures are expected to be 10 to 20 degrees C
hotter than stand alone DRAM ICs, and leakage currents from DRAM memory cells typically
double for every ten degrees C rise, the ECC also enhances the base reliability of the embedded
DRAM by scrubbing intermittent failures associated with such leakage currents that appear at
high operating temperatures.
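
The effect of the temperature difference on leakage can be estimated directly from the doubling rule. The sketch below is illustrative only; the function name `leakage_multiplier` is hypothetical, and it simply applies the stated rule of a doubling for every ten degrees C.

```python
def leakage_multiplier(temp_rise_c: float, doubling_step_c: float = 10.0) -> float:
    """Factor by which cell leakage grows for a given temperature rise in degrees C."""
    return 2.0 ** (temp_rise_c / doubling_step_c)


for rise in (10, 20):
    print(f"{rise} degrees C hotter: leakage x{leakage_multiplier(rise):g}")
```

Under that rule, an embedded DRAM running 10 to 20 degrees C hotter than a stand alone part would see roughly two to four times the cell leakage.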

Another alternative embodiment of the invention is now described, with reference
generally to Fig. 3, but pointing out specific differences therefrom. In the alternative embodiment,
the data bits portion of data word 303 is transferred directly from array 301 to DQ bus 308, in
parallel with the transfer of data word 303 to ECC system 305. In such arrangement, the ECC
system 305 provides an enable flag to DQ bus 308 if no error is detected. Final activation of the data
drivers of DQ bus 308 is then conditioned on receipt of the enable flag. However, when ECC
system 305 detects an error, the enable flag is not present. The data drivers are then held in a
tri-state mode until the ECC system 305 provides corrected data, the corrected data is reloaded into
data latches of DQ bus 308 and the ECC system 305 reactivates the data drivers with the enable
flag.
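
The sequence of events on DQ bus 308 in this alternative embodiment can be traced with a small behavioural model. This is a sketch under assumptions: the names `EccResult` and `drive_dq` are hypothetical, events are represented as strings purely for illustration, and leaving the drivers tri-stated for an uncorrectable error is an assumption, since the description does not address that case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EccResult:
    error_detected: bool
    corrected_data: Optional[int]  # None when the error cannot be corrected


def drive_dq(raw_data: int, ecc: EccResult) -> List[str]:
    """Return the ordered bus events for one read in the parallel-transfer scheme."""
    events = [f"latch {raw_data:#x}"]       # data bits arrive directly from the array
    if not ecc.error_detected:
        events.append("enable")             # enable flag releases the data drivers
        events.append(f"drive {raw_data:#x}")
        return events
    events.append("tri-state")              # no enable flag: drivers are held off
    if ecc.corrected_data is not None:
        events.append(f"latch {ecc.corrected_data:#x}")  # reload corrected data
        events.append("enable")
        events.append(f"drive {ecc.corrected_data:#x}")
    return events


clean = drive_dq(0xAB, EccResult(False, None))
fixed = drive_dq(0xAB, EccResult(True, 0xAF))
print(clean)
print(fixed)
```

The design choice illustrated here is that error-free reads avoid the correction latency, at the cost of a reload cycle whenever an error is detected.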

While the invention has been described in relation to certain preferred embodiments
thereof, those skilled in the art will recognize the many modifications and enhancements which
can be made without departing from the true scope and spirit of the invention, as limited only by
the claims appended below.